Automating berthing maneuvers is a pressing issue in shipping, as berthing is one of the most stressful tasks seafarers undertake. Berthing control problems are often tackled by tracking a predefined trajectory or path. Maintaining a tracking error of zero in an uncertain environment is impossible; the tracking controller is nonetheless required to bring vessels close to the desired berths. The tracking controller must therefore prioritize avoiding tracking errors that may cause collisions with obstacles. This paper proposes a training method based on reinforcement learning for a trajectory tracking controller that reduces the probability of collisions with static obstacles. Through numerical simulations, we show that the proposed method reduces the probability of collisions during berthing maneuvers. Furthermore, this paper shows the tracking performance in a model experiment.
In this study, we consider simulation-based worst-case optimization problems with continuous design variables and a finite scenario set. To reduce the number of simulations required and increase the number of restarts for better local optimum solutions, we propose a new approach referred to as adaptive scenario subset selection (AS3). The proposed approach subsamples a scenario subset that serves as a support for constructing the worst-case function in a given neighborhood; we refer to the scenarios in this subset as support scenarios. Moreover, we develop a new optimization algorithm, denoted AS3-CMA-ES, by combining AS3 with the covariance matrix adaptation evolution strategy (CMA-ES). At each iteration, a subset of support scenarios is selected, and CMA-ES attempts to optimize the worst-case objective computed only on that subset. The proposed algorithm thus reduces the number of simulations required by executing simulations on a scenario subset rather than on all scenarios. In numerical experiments, we verified that AS3-CMA-ES is more efficient in terms of the number of simulations than the brute-force approach and a surrogate-assisted approach, lq-CMA-ES, when the ratio of the number of support scenarios to the total number of scenarios is relatively small. In addition, the usefulness of AS3-CMA-ES was evaluated on well placement optimization for carbon dioxide capture and storage (CCS). Compared with the brute-force approach and lq-CMA-ES, AS3-CMA-ES found better solutions because of more frequent restarts.
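The idea of evaluating the worst case on a scenario subset can be illustrated with a minimal sketch. It assumes a minimization setting, so the worst case is the maximum over scenarios; the objective `toy_objective`, the scenario values, and the helper `worst_case` are hypothetical and are not taken from the paper, and the sketch does not implement the AS3 selection rule itself.

```python
from typing import Callable, Sequence


def worst_case(f: Callable[[float, float], float], x: float,
               scenarios: Sequence[float]) -> float:
    """Largest objective value of design x over the given scenarios.

    Each call to f stands in for one simulation run.
    """
    return max(f(x, s) for s in scenarios)


def toy_objective(x: float, s: float) -> float:
    # Hypothetical stand-in for an expensive simulation.
    return (x - s) ** 2


ALL_SCENARIOS = [-2.0, -1.0, 0.0, 1.0, 2.0]
x = 0.5

# Brute force: one simulation per scenario (5 runs).
full = worst_case(toy_objective, x, ALL_SCENARIOS)

# A subset that contains the worst scenario near x reproduces the
# full worst-case value with fewer simulations (2 runs).
support = worst_case(toy_objective, x, [-2.0, 2.0])

# A badly chosen subset only gives a lower bound on the worst case.
poor = worst_case(toy_objective, x, [0.0, 1.0])
```

Any subset yields a lower bound on the true worst case; the bound is tight exactly when the subset contains a scenario that attains the maximum at the current design, which is why the choice of subset matters.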
In reinforcement learning, because of the high cost and risk of training policies in the real world, policies are trained in a simulation environment and then transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification. Multiple studies report significant deterioration of policy performance in the real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance over the uncertainty parameter set to guarantee performance in the corresponding real-world environment. To obtain such a policy, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method achieved worst-case performance superior to several baseline approaches.
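Simultaneous gradient ascent descent on a max-min problem can be sketched on a toy function. This is not M2TD3 (there are no networks, critics, or environments here); the objective `J(theta, omega) = -theta^2 + 2*theta*omega + omega^2`, the step size, and all names are assumptions chosen so that the problem is concave in `theta`, convex in `omega`, and has its saddle point at the origin.

```python
def grad_theta(theta: float, omega: float) -> float:
    # dJ/dtheta for the toy objective J = -theta^2 + 2*theta*omega + omega^2
    return -2.0 * theta + 2.0 * omega


def grad_omega(theta: float, omega: float) -> float:
    # dJ/domega for the same toy objective
    return 2.0 * theta + 2.0 * omega


def simultaneous_gda(theta: float, omega: float,
                     step: float = 0.1, iterations: int = 200):
    """Ascend in theta (maximizer) and descend in omega (minimizer).

    Both gradients are evaluated at the current point before either
    variable is updated, which is what makes the scheme simultaneous.
    """
    for _ in range(iterations):
        g_theta = grad_theta(theta, omega)
        g_omega = grad_omega(theta, omega)
        theta = theta + step * g_theta   # ascent: maximize over theta
        omega = omega - step * g_omega   # descent: minimize over omega
    return theta, omega


theta_final, omega_final = simultaneous_gda(theta=1.0, omega=-1.0)
```

In the robust reinforcement learning setting, `theta` would play the role of the policy parameters and `omega` the uncertainty parameter; on this toy problem the iterates contract toward the saddle point (0, 0) for the chosen step size.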
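Evolution strategy (ES) is one of the promising classes of algorithms for black-box continuous optimization. Despite its broad success in applications, theoretical analysis of its convergence speed has been limited to convex quadratic functions and their monotonic transformations. In this study, upper and lower bounds on the linear convergence rate of the (1+1)-ES on locally $L$-strongly convex functions with $U$-Lipschitz continuous gradient are derived as $\exp\left(-\Omega_{d\to\infty}\left(\frac{L}{d\cdot U}\right)\right)$ and $\exp\left(-\frac{1}{d}\right)$, respectively. Notably, no prior knowledge of the mathematical properties of the objective function, such as the Lipschitz constant, is given to the algorithm, whereas existing analyses of derivative-free optimization algorithms require it.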
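For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of a (1+1)-ES with success-based step-size adaptation. The multipliers `alpha_up` and `alpha_down` follow a common 1/5-success-rule choice and are assumptions, as are the sphere test function, the seed, and the iteration budget; the analyzed variant may use different constants.

```python
import math
import random


def one_plus_one_es(f, x0, sigma0, iterations, seed=0,
                    alpha_up=math.exp(1.0 / 3.0),
                    alpha_down=math.exp(-1.0 / 12.0)):
    """(1+1)-ES: one parent, one offspring, elitist selection.

    The step size grows after a successful offspring and shrinks
    otherwise. The algorithm uses only comparisons of f-values and
    needs no gradient or Lipschitz constant.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    sigma = sigma0
    for _ in range(iterations):
        # Sample an offspring from an isotropic Gaussian around the parent.
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = f(candidate)
        if fc <= fx:
            x, fx = candidate, fc
            sigma *= alpha_up      # success: increase the step size
        else:
            sigma *= alpha_down    # failure: decrease the step size
    return x, fx, sigma


def sphere(x):
    # Strongly convex test function with Lipschitz continuous gradient.
    return sum(v * v for v in x)


x_final, f_final, sigma_final = one_plus_one_es(
    sphere, x0=[1.0] * 5, sigma0=1.0, iterations=2000)
```

With these multipliers the logarithm of the step size is stationary when roughly one offspring in five is successful, since `log(alpha_up) = -4 * log(alpha_down)`.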
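In real-world applications of multi-class classification models, misclassification of an important class (e.g., stop signs) can be significantly more harmful than misclassification of other classes (e.g., speed limits). In this paper, we propose a loss function that improves the recall of an important class while maintaining the same level of accuracy as cross-entropy loss. For this purpose, the important class must be separated from the other classes better than they are separated from one another. However, existing methods that add a class-sensitive penalty to cross-entropy loss do not improve the separation. On the other hand, methods that add a margin to the angle between the feature vector and the weight vector of the last fully connected layer corresponding to each class can improve the separation. We therefore propose a loss function that improves the separation of the important class by setting the margin only for that class, called Class-sensitive Additive Angular Margin Loss (CAMRI Loss). By adding a penalty to the angle, CAMRI Loss creates a margin around the important class in the feature space and is expected to reduce the variance of the angles between the features and weights of the important class relative to other classes. In addition, concentrating the penalty only on the important class hardly sacrifices the separation of the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 show that the proposed method improves recall by up to 9% over cross-entropy loss without sacrificing accuracy.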
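The mechanism of applying an angular margin only to important classes can be sketched for a single sample. The abstract does not give the exact formulation, so this sketch assumes an ArcFace-style additive margin on the true-class angle followed by a scaled softmax cross-entropy; the function names, the `margin` and `scale` values, the clamping at pi, and the two-class example are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors, clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def class_sensitive_margin_loss(feature, weights, label, important,
                                margin=0.5, scale=16.0):
    """Cross-entropy on scaled cosine logits for one sample.

    The angular margin is added to the true-class angle only when the
    true class is in `important`; other samples see a plain cosine
    softmax loss. Clamping the angle at pi is a simplification.
    """
    logits = []
    for k, w in enumerate(weights):
        theta = math.acos(cosine(feature, w))
        if k == label and k in important:
            theta = min(theta + margin, math.pi)
        logits.append(scale * math.cos(theta))
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


feature = [1.0, 0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]   # one weight vector per class

loss_important = class_sensitive_margin_loss(feature, weights, 0, {0})
loss_plain = class_sensitive_margin_loss(feature, weights, 0, set())
loss_other = class_sensitive_margin_loss(feature, weights, 1, {0})
loss_other_plain = class_sensitive_margin_loss(feature, weights, 1, set())
```

Because the margin lowers the true-class logit, a sample of the important class incurs a larger loss at the same angle, which pushes its features closer to the class weight vector during training; samples of other classes are unaffected.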
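We propose an approach to saddle point optimization that relies only on oracles that solve minimization problems approximately. We analyze its convergence on strongly convex-concave problems and show linear convergence toward the global min-max saddle point. Based on the convergence analysis, we develop a heuristic approach to adapt the learning rate. An implementation of the developed approach using the (1+1)-CMA-ES as the minimization oracle, namely Adversarial-CMA-ES, is shown to outperform several existing approaches on test problems. Numerical evaluation confirms the tightness of the theoretical convergence rate bound as well as the efficiency of the learning rate adaptation mechanism. As an example of a real-world problem, the proposed optimization method is applied to automatic berthing control under model uncertainty, showing its usefulness in obtaining solutions that are robust to uncertainty.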
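The role of the learning rate in an oracle-based saddle point method can be illustrated with a minimal sketch. The abstract does not state the update rule, so this sketch assumes that each variable moves a fraction `eta` of the way toward the oracle's answer; it uses exact closed-form oracles on a hypothetical quadratic `f(x, y) = x^2/2 + b*x*y - y^2/2` in place of the (1+1)-CMA-ES, and all names and constants are illustrative.

```python
def argmin_x(y: float, b: float) -> float:
    # Exact minimizer of f(., y) for f(x, y) = x^2/2 + b*x*y - y^2/2.
    return -b * y


def argmax_y(x: float, b: float) -> float:
    # Exact maximizer of f(x, .), i.e. the minimizer of -f(x, .).
    return b * x


def oracle_saddle_search(x: float, y: float, b: float,
                         eta: float, iterations: int):
    """Move both variables a fraction eta toward the oracle answers.

    Both oracle calls use the current point, and the two variables are
    updated together.
    """
    for _ in range(iterations):
        x_target = argmin_x(y, b)
        y_target = argmax_y(x, b)
        x, y = x + eta * (x_target - x), y + eta * (y_target - y)
    return x, y


# Strong interaction between x and y (b = 2).
x_small, y_small = oracle_saddle_search(1.0, 1.0, b=2.0,
                                        eta=0.2, iterations=300)
x_full, y_full = oracle_saddle_search(1.0, 1.0, b=2.0,
                                      eta=1.0, iterations=20)
```

On this toy problem the small learning rate contracts toward the saddle point (0, 0), while jumping all the way to the oracle answers (`eta = 1.0`) diverges when the interaction term is strong, which suggests why adapting the learning rate matters.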